37 research outputs found

    Pencarian Informasi Menggunakan Model Terdekomposisi: Aplikasi Pada Riset Penemuan Antibiotik

    Antibacterial drug discovery is one of the emerging challenges in chemoinformatics. There is an urgent need to find new, effective drugs faster because many bacteria have become resistant to older drugs. The chemical molecules stored in the databases of companies and laboratories provide potential candidates for developing new drugs. However, there are far too many candidates to investigate. This is where knowledge discovery can help, by sifting through the known properties of the molecules to select the most promising candidates for further experiments. The number of properties, or descriptors, characterizing the molecules is rather large. The aim of the present work is to study these descriptors and find out which ones really matter, that is, to reduce the dimension of the description space. We focus on a subset of antibacterial molecules already on the market, with around 500 descriptors obtained from the selection process in the previous work. As our feature selection procedure, we use log-linear analysis (LLA) to discover associations among descriptors. Given that the number of descriptors is high, we study Chordalysis, which focuses on a specific subset of log-linear models: decomposable models. We find that the descriptors selected in the previous work still have many associations among them. Therefore, a number of redundant descriptors can still be left out.

    PENGGALIAN INFORMASI MENGGUNAKAN MODEL TERDEKOMPOSISI: APLIKASI PADA RISET PENEMUAN ANTIBIOTIK

    Antibiotic drug discovery is one of the challenges in chemoinformatics. New antibiotics are needed quickly and effectively because many bacteria have become resistant to older antibiotics. Chemical molecules stored at various companies and laboratories provide candidates with potential as new antibiotics. However, there are too many candidates to investigate. To address this, information retrieval is needed that can detect important candidates through their attributes. The number of such attributes is very large. The aim of this study is to examine these attributes and determine which ones are important, in other words, to reduce the dimensionality of the molecular data. The study focuses on antibiotic molecules already on the market, with around 500 attributes obtained from previous research. As the feature selection procedure, this study uses log-linear analysis to find associations among attributes. Because the attributes number in the hundreds, Chordalysis is used, which operates on decomposable log-linear models. The study finds that the attributes from the previous research have several associations among them. Consequently, some redundant attributes can be eliminated.

    Fouille de données complexes et biclustering avec l'analyse formelle de concepts

    Knowledge discovery in databases (KDD) is a process applied to possibly large volumes of data to discover patterns that can be significant and useful. In this thesis, we are interested in data transformation and data mining in knowledge discovery applied to complex data, and we present several experiments related to different approaches and different data types. The first part of this thesis focuses on the task of biclustering using formal concept analysis (FCA) and pattern structures. FCA is naturally related to biclustering, where the objective is to simultaneously group rows and columns that satisfy some regularities. Pattern structures are a generalization of FCA that works on more complex data. Partition pattern structures were proposed to discover constant-column biclusters, while interval pattern structures were studied for similar-column biclustering. Here we extend these approaches to enumerate other types of biclusters: additive, multiplicative, order-preserving, and coherent-sign-changes. The second part of this thesis focuses on two experiments in mining complex data. First, we present a contribution related to the CrossCult project, where we analyze a dataset of visitor trajectories in a museum. We apply sequence clustering and FCA-based sequential pattern mining to discover patterns in the dataset and to classify these trajectories. This analysis can be used within the CrossCult project to build recommendation systems for future visitors. Second, we present our work related to the task of antibacterial drug discovery. The dataset for this task is generally a numerical matrix with molecules as rows and features/attributes as columns. The huge number of features makes it harder for any classifier to perform molecule classification. Here we study a feature selection approach based on log-linear analysis, which discovers associations among features. As a synthesis, this thesis presents a series of different experiments in the mining of complex real-world data.

    Application of sequential pattern mining to the analysis of visitor trajectories

    In this work, we demonstrate a proof of concept for clustering 254 visitors based on their trajectories in a museum. We used a real dataset from Haifa Museum, where each trajectory is treated as a sequence of itemsets. We applied simACS as a similarity measure between any two sequences.

    Application des Pattern Structures à la découverte de biclusters à changements de signes cohérents

    Biclustering plays a major role in many real-world applications. It is related to clustering, which groups similar rows in a numerical data matrix, whereas biclustering seeks to group similar rows and columns simultaneously, that is, to find submatrices in which a correlation emerges among the entries. Biclustering relies on a similarity criterion, and in this article we focus on "constant-column" (CC) biclustering, where the numerical values within each column of a submatrix are the same for every row. The study is then extended to "coherent-sign-changes" (CSC) biclustering, where the difference between the values of two consecutive columns of the submatrix has the same sign for every row.
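
The two bicluster criteria named in this abstract, constant-column (CC) and coherent-sign-changes (CSC), can be illustrated with a minimal sketch. This is a direct check of the definitions on a given submatrix, not the pattern-structure method of the paper; the function names `is_constant_column`, `sign`, and `is_coherent_sign_changes` are hypothetical and chosen here for illustration only.

```python
from typing import Sequence


def is_constant_column(sub: Sequence[Sequence[float]]) -> bool:
    """CC check: every column of the submatrix holds one value across all rows."""
    return all(len(set(column)) == 1 for column in zip(*sub))


def sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_coherent_sign_changes(sub: Sequence[Sequence[float]]) -> bool:
    """CSC check: the sign of the difference between each pair of
    consecutive columns is the same in every row."""
    patterns = {
        tuple(sign(b - a) for a, b in zip(row, row[1:]))
        for row in sub
    }
    # All rows must share a single sign pattern (an empty submatrix passes).
    return len(patterns) <= 1


# Hypothetical toy submatrices, not data from the paper.
cc_example = [[1, 5, 3], [1, 5, 3]]
csc_example = [[1, 5, 3], [2, 9, 4]]      # both rows go up, then down
not_csc_example = [[1, 5, 3], [4, 2, 7]]  # second row goes down, then up
```

Under these definitions every CC submatrix is also CSC, since identical rows trivially share the same sign pattern; the converse does not hold, as `csc_example` shows.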

    Formal Concept Analysis for Identifying Biclusters with Coherent Sign Changes

    In this paper we are studying the task of finding coherent-sign-changes biclusters in a binary matrix. This task can be applied to the interpretation of gene expression data, where such a bicluster represents a set of experiments that affect a set of genes in a consistent way. We start with a binary table and study biclustering methods based on FCA and partition pattern structures. Pattern concepts provide biclusters and their hierarchical relation, which can be used to analyze the profile of genes in the given expression data. Our approach is purely symbolic, so we can detect larger biclusters and work with rather complex data

    The Use of Picture Storybooks in Blended-based Learning Method to Teach Literacy to Young Learners

    This study investigates the implementation of picture storybooks integrated with the blended learning method through an LMS for fourth-grade young learners in English literacy learning. The researchers used an experimental, post-test-only control group design. Document analysis and statistical analysis of students' scores were used to compare their literacy skills. The research sample was 70 fourth-grade students at SDN 3 Banjar Jawa Singaraja. The results showed significant differences in students' scores when they learned English literacy using picture storybooks integrated with the blended-based learning method. The data were tested using an independent-samples t-test; the result showed a sig. (2-tailed) value of 0.011, which is smaller than the standard alpha level (α = 0.05). The results also showed that picture storybooks integrated with the blended learning method can help students learn English literacy. Therefore, the use of picture storybooks integrated with the blended-based learning method is highly recommended because it has a significant influence on English literacy learning for young learners and can also make learning fun and effective.

    Biclustering Based on FCA and Partition Pattern Structures for Recommendation Systems

    This paper focuses on item recommendation for visitors in a museum within the framework of the European project CrossCult on cultural heritage. We present theoretical research on recommendation using biclustering. Our approach is based on biclustering using FCA and partition pattern structures. First, we recall a previous method of recommendation based on constant-column biclusters. Then, we propose an alternative approach that incorporates order information and uses coherent-evolution-on-columns biclusters. This alternative approach shares some common features with sequential pattern mining. Finally, given a dataset of visitor trajectories, we indicate how these approaches can be used to build a collaborative recommendation strategy.

    Sequential Pattern Mining using FCA and Pattern Structures for Analyzing Visitor Trajectories in a Museum

    This paper presents our work on mining visitor trajectories in the Hecht Museum (Haifa, Israel), within the framework of the CrossCult European project on cultural heritage. We present theoretical and practical research on the characterization of visitor trajectories and the mining of these trajectories as sequences. The mining process is based on two approaches in the framework of FCA, namely the mining of subsequences without any constraint and the mining of frequent contiguous subsequences. Both approaches are based on pattern structures. In parallel, a similarity measure allows us to build a hierarchical classification, which is used for the interpretation and characterization of the trajectories w.r.t. four well-known visiting styles.
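
To make the notion of frequent contiguous subsequences concrete, here is a minimal sketch that counts them by plain enumeration. It is not the FCA or pattern-structure procedure used in the paper, and it assumes a simplification: each trajectory is a list of single item labels rather than a sequence of itemsets. The function name `frequent_contiguous_subsequences` and its parameters `length` and `min_support` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def frequent_contiguous_subsequences(
    trajectories: Sequence[Sequence[Hashable]],
    length: int,
    min_support: int,
) -> Dict[Tuple[Hashable, ...], int]:
    """Return contiguous subsequences of the given length that occur in at
    least `min_support` trajectories, mapped to their support."""
    if length < 1:
        raise ValueError("length must be at least 1")
    support: Counter = Counter()
    for trajectory in trajectories:
        # Collect each window once per trajectory, so that support counts
        # trajectories rather than raw occurrences.
        windows = {
            tuple(trajectory[i:i + length])
            for i in range(len(trajectory) - length + 1)
        }
        support.update(windows)
    return {p: count for p, count in support.items() if count >= min_support}


# Hypothetical toy trajectories (exhibit labels), not the museum dataset.
toy_trajectories = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C", "A", "B"],
]
frequent_pairs = frequent_contiguous_subsequences(
    toy_trajectories, length=2, min_support=2
)
```

With these toy trajectories, `("A", "B")` appears in all three sequences and `("B", "C")` in two, so both are kept at `min_support=2`, while `("B", "D")` and `("C", "A")` are dropped.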